Collaborating Authors

Benjamin Bengfort


Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning, by Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda (ISBN 9781491963043)

#artificialintelligence

In this book, we focus on applied machine learning for text analysis using the Python libraries just described. The applied nature of the book means that we focus not on the academic side of linguistics or statistical models, but on how to deploy models trained on text effectively inside a software application. The model for text analysis we propose is directly related to the machine learning workflow: a search process to find a model, composed of features, an algorithm, and hyperparameters, that best operates on training data to produce estimations on unknown data. This workflow starts with the construction and management of a training dataset, called a corpus in text analysis. We will then explore feature extraction and preprocessing methodologies that turn text into numeric data that machine learning can work with. With some basic features in hand, we explore techniques for classification and clustering on text, concluding the first few chapters of the book.
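The workflow described above (corpus, feature extraction, then a classifier governed by a hyperparameter) can be sketched in a few lines of standard-library Python. This is only a minimal illustration, not the book's own code: the toy corpus, the function names (tokenize, vectorize, train, predict), and the choice of a multinomial naive Bayes classifier with a smoothing hyperparameter alpha are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (document, label) pairs invented for illustration.
CORPUS = [
    ("the team won the match with a late goal", "sports"),
    ("the striker scored twice in the final game", "sports"),
    ("the new phone ships with a faster processor", "tech"),
    ("the laptop update improves battery and processor speed", "tech"),
]


def tokenize(text):
    """Preprocessing: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Feature extraction: map a document to bag-of-words counts."""
    return Counter(tokenize(text))


def train(corpus, alpha=1.0):
    """Fit a multinomial naive Bayes model; alpha is the smoothing hyperparameter."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in corpus:
        features = vectorize(text)
        label_counts[label] += 1
        word_counts[label].update(features)
        vocabulary.update(features)
    return {"labels": label_counts, "words": word_counts,
            "vocab": vocabulary, "alpha": alpha}


def predict(model, text):
    """Return the label with the highest log-probability for an unseen document."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    features = vectorize(text)
    scores = {}
    for label, doc_count in model["labels"].items():
        total_words = sum(model["words"][label].values())
        score = math.log(doc_count / total_docs)  # log prior
        for word, count in features.items():
            if word not in model["vocab"]:
                continue  # ignore words never seen during training
            likelihood = (model["words"][label][word] + alpha) / (
                total_words + alpha * vocab_size
            )
            score += count * math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(CORPUS)
prediction = predict(model, "the goal decided the game")
print(prediction)
```

In this sketch, searching for a better model would mean varying the three ingredients named above: the features (for example, different tokenization), the algorithm, and hyperparameters such as alpha, then comparing results on held-out documents.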


Data Analytics with Hadoop: An Introduction for Data Scientists, by Benjamin Bengfort and Jenny Kim (ISBN 9781491913703)

@machinelearnbot

It is a great overview of a plethora of topics around doing scalable data analytics and data science. It is extremely up to date, going through techniques that have existed for many years, like MapReduce, but also newer systems like Spark, all in the context of the Hadoop ecosystem. The authors cover machine learning techniques and data management, and overall paint a nice picture of what data science is and why data products are important, while teaching you how to make them! Every single concept is explained in a clear and concise manner, and wherever details are omitted there is always a citation to a source where the reader can continue reading about it, which I think is great. Although I wouldn't classify myself as a beginner, I believe it is friendly to both professionals and beginners, as it is centered around Python, which makes most examples (conveniently uploaded to a GitHub repository) really easy to run and play around with.